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ABSTRACT 

We revisit the uncertainty in baryon acoustic oscillation (BAO) forecasts and data 
analyses. In particular, we study how much the uncertainties on both the measured 
mean dilation scale and the associated error bar are affected by the non-Gaussianity of 
the non-linear density field. We examine two possible impacts of non-Gaussian analy- 
sis: (I) we derive the distance estimators from Gaussian theory, but use 1000 N-Body 
simulations to measure the actual errors, and compare this to the Gaussian prediction, 
and (2) we compute new optimal estimators, which requires the inverse of the non- 
Gaussian covariance matrix of the matter power spectrum. Obtaining an accurate and 
precise inversion is challenging, and we opted for a noise reduction technique applied 
on the covariance matrices. By measuring the bootstrap error on the inverted matrix, 
this work quantifies for the first time the significance of the non-Gaussian error correc- 
tions on the BAO dilation scale. We find that the variance (error squared) on distance 
measurements can deviate by up to 12% between both estimators, an effect that re- 
quires a large number of simulations to be resolved. We next apply a reconstruction 
algorithm to recover some of the BAO signal that had been smeared by non-linear 
evolution, and we rerun the analysis. We find that after reconstruction, the rms error 
on the distance measurement improves by a factor of ~ 1.7 at low redshift (consistent 
with previous results), and the variance (a 2 ) shows a change of up to 18% between 
optimal and sub-optimal cases (note, however, that these discrepancies may depend in 
detail on the procedure used to isolate the BAO signal) . We finally discuss the impact 
of this work on current data analyses. 

Key words: Cosmology: observations — Dark energy — Large-scale structure of the 
Universe — Distance scale — Methods: statistical 



1 INTRODUCTION 

Ever since acoustic p eaks were detecte d in the cosmic mi- 
crowave background ([Miller et a3 [T999) and galaxy surveys 
ijEisenstein et alll2005l 'l. much effort has been devoted to use 
baryon acoustic oscillations (BAO) as standar d rulers to 
estim a te cosmological distances w i th precision jEisensteinl 
20051: iBlake fc Glazebrookl 120031; ISeo fc Eisensteinl 1200% 
Bassett fc Hlozekl 120091 : iMcDonald fe Eisensteinl 1200 it ). 
BAO signals are manifested as a wiggly feature in the 
matter power spectrum, and precise measurements could 
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shed light on the dynamics of dark energy. The ac- 
curacy and precision of such measurement depends di- 
rectly on the covariance matrix of the matter power spec- 
trum, which is straightforward to compute for a Gaus- 
sian density field using Wick's theorem. However, it is 
known that non -Gaussianities are significant in that co- 
variance matrix dMeiksin fc W hite 1999; S coccimarro et all 
ll999l : lRimes fc Hamiltonll2005l . 120061 : iTakahashi et alll2oTlh 



and may have an impact on cosmological distance measure- 
ments. A non-linear model is needed for a non-Gaussian cal- 
culation, hence some criterion for the accuracy of such ef- 
fects is needed to quantify the precision of the measurement. 
In particular, an optimal BAO measurement should in gen- 
eral incorporate a proper error weighting of the data, which 
involves the inversion of the full covariance matrix. The ac- 
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curacy criterion must therefore be based on the confidence 
we have on the inverted matrix. 

The estimator constructed in most forecasts and dat a 
analyses follow the prescription of ISeo fc Eisensteinl l|2007t ). 
which is a procedure that constructs both the estimator of 
the BAO scale and the estimator of its uncertainty under 
the Gaussian assumption. The method was originally cross- 
checked with a x 2 analysis of 51 simulations, including a 
jackknife sub-sampling of the N-body data, and showed good 
agreement. The actual deviation between this proposed es- 
timator and the optimal estimator constructed in this pa- 
per is indeed small, th erefore consistent with the work of 
ISeo fc Eisensteinl (|2007h . In this paper, we aim at improving 
on the accuracy of that method, and present a first signif- 
icant detection of the effect of non-Gaussian errors on the 
measurement of the BAO scale (the effect of non-linear evo- 
lution, in the form of erasure of the wiggle sig nal, has been 
well-studied of course l|Seo fe EisensteirJl2007l i). 

Some data analyses jHiitsij 120061 ; iPercival et a 1 2010; 
Eiscnstcin et a 1 2005; Cole et a 1 2005) also improved on the 
original method by constructing a non-Gaussian estimator 
for the BAO dilation scale uncertainty. This is typically done 
by modeling the non-linearities in the density fields with 
moc k catalogs, which are produced from lo g-normal densi- 
ties i|Percival et alil2010l:IColes fc JonesHl99ll). 2 nd order per- 
turba tion theory ((Hiitsi 2006), halo models (Eisc nstein et all 
l2005h . etc. Such techniques all attempt to increase the ro- 
bustness of the analysis by taking into account the coupling 
between the Fourier modes. As mentioned above, an opti- 
mal analysis must be based on a reliable inverted covariance 
matrix, and the accuracy of the inverse matrix constructed 
from mock catalogs is yet to be demonstrated. The covari- 
ance matrix in power spectrum is a four point function that 
relates pairs of wave numbers. The error on this covariance 
consists of pairs of these pairs, and is indeed difficult to 
quantify. Without this metric, however, one does not know 
the significance of a no n-Gaussian computation. In addition, 
iTakahashi et all (|2009h have found significant departures be- 
tween the covariance matrix constructed from Lagrangian 
perturbation theory and that obtained from their 5000 sim- 
ulations. Also unknown is the accuracy of log-normal densi- 
ties at modeling the true covarian ce matrix and its inverse. 
Other analyses (|Blake et a 3 120101 ) treat the mode coupling 
as coming exclusively from the survey selection fun ction, 
following the widely used FKP (|Feldman et all I1994T ) pre- 
scription. This specific coupling effect can be reduced with 
other ch oices of power s pectru m estimators, like that pre- 
sented in lTegmark et all (2006). In both cases, however, the 
non-linear mode coupling is not modeled. 

When it comes to the impact on the BAO dilation scale, 
a Gaussian treatment of the data yields a sub-optimal esti- 
mation of the mean, and the error bars obtained that way are 
systematically biased, usually on the low side. In the limit 
where the sample is large enough, the value of the mean es- 
timated in that fashion does converge to the "true" mean, 
but the estimated error bars never capture the correlation 
that occurs in the non-linear regime. Many analyses attempt 
to correct for this bias with Monte Carlo simulations, how- 
ever this effect is very small and takes a high accuracy and 
precision to observe. 

In the non-Gaussian case, however, an inversion of the 
covariance matrix is required, hence it cannot be singular. 



Consequently, the convergence of our measured matrix de- 
pends on the binning, and the inversion increases the noise 
even more. For an adequate resolution on all the scales 
relevant for BAO analyses, the number of simulations re- 
quired to obtain statistically significant conclusions can be 
large. In the past, different groups used drast i cally differ- 
ent numbers of simulations: ISeo fc Eisensteinl d2005h used 
5ljRimes fc Hamilton! (f I)05T) used 400. and lTakahashi et all 
l|201lD used 5000; we use 1000 N-body simulations in this 
work. In order to invert &NxN covariance matrix, one needs 
at least N simulations to make the matrix non-singular. It is 
also generally thought that to achieve convergence on each 
element, we need of the order N 2 simulations. Even then, 
the level of accuracy is not clear, and to make a significant 
claim about non-Gaussian effects, one needs to know the un- 
certainty on the inverted matrix, which we measure from a 
bootstrap re-sampling, and to propagate the error onto the 
BAO scale. 

To address this convergence issue, we also apply a noise 
reduction technique before the inversion: we factorize the 
covariance matrix with an Eigenvector decomposition, and 
keep only the principal component. This factorization is re- 
peated at each of the bootstrap samplings, which allows us 
to draw robust conclusions on the convergence of our results. 

Given the fact that the precision of the inverse co- 
variance matrices used in analyses has never been demon- 
strated, the measurement of the mean and of the error on 
the mean found on the literature are most likely not optimal. 
Measuring an optimal BAO scale in actual data is compli- 
cated in many aspects, such as the fact that the Universe 
is not periodic, and that surveys have selection functions. 
It would nevertheless involve a covariance weighted mea- 
surement, which is not i ncluded in the prescription given by 
l|Seo fc Eisensteinll2007l ). If one could improve the measure- 
ment of the power spectrum covariance, however i.e. from 
N-Body simulations, it would be possible to measure a more 
robust and more accurate uncertainty on the sub-optimal 
mean, compared to the original claim. The difference in per- 
formance between these two BAO dilation scale estimator is 
still an unmeasured quantity. In this paper, we first attempt 
to address this question by comparing three different anal- 
ysis scenarios: 

(i) The first case we consider is an attempt at measur- 
ing a correct error bar on a sub-optimal mean of the BAO 
dilation scale. Even if we know that the Universe is non- 
Gaussian, it is still possible to treat it as Gaussian, i.e. not 
use an optimally weighted sum when estimating the mean, 
even though the power spectrum itself is non-linear. Doing 
so, we must keep in mind that the measurements are non- 
optimal, and that the naive Gaussian error bars are most 
likely too small compared to the "true" error. However, given 
the fact that we can measure a full covariance matrix from 
N-body simulations, we can get a better estimate of the er- 
ror bars on that sub-optimal mean by treating the original 
covariance matrix as noisy and by performing an appropri- 
ate inverse covariance weighting. From now on, we refer to 
this case as the "sub-optimal" estimator of the BAO error. 
This approach is commonly used to obtain "Monte-Carlo" 
error bars, and exactly what is measured by the forecast- 
ing prescriptions mentioned above. We call this approach 
"sub-optimal" because the errors bars could still be further 
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reduced by improving the estimator of the mean BAO scale 
using the non-Gaussian model. It provides a quick estimate 
of the magnitude of non-Gaussianity, and a simple way of 
scaling errors bars obtained from Gaussian analysis, which 
is often used in surveys. 

(ii) The second case, dubbed "optimal estimator", is the 
best quadratic analysis one can possibly do - knowing that 
the Universe is non-Gaussian, we treat it as is, measuring 
a fully non-linear power spectrum covariance matrix and 
performing an optimally weighted sum to estimate both the 
means and the uncertainties on the parameters. Given the 
fact that we rely on a large number of N-Body simulations, 
that our volume is periodic, and that we have a high signal 
to noise, then both our estimators are truly optimal in the 
least squares sense. Q 

(iii) Many times in the literature (| Blake et all l20ld : 
iTegmark etall2006l : [Percival et all[200ll ). the above case is 
modified by replacing the non-linear covariance matrix by 
a Gaussian one. This effectively treats the power spectrum 
measurements as Gaussian, even thou gh the data are cor - 
related. Under the widely used FKP (jFeldman et allll994r ) 
approximation, for example, the only source of mode cou- 
pling comes from the convolution with the survey selection 
function; it thus considers the underlying covariance matrix 
to be diagonal. For reasons mentioned above, the error bars 
obtained this way could be systematically underestimated. 

We did not address the question of shot noise, however, 
which is also non-Gaussian in nature. In the case of surveys 
that address some of the non-Gaussianities with mock cat- 
alogs, the error correct bars should lie somewhere between 
case (i) and case (ii) (while some approach case (iii)), de- 
pending on how close to optimal the measured power spec- 
trum is, and how well the cata logs model the non-linear dy- 
namics. iTakahashi et all i|201ll ) showed that the difference in 
estimator between cases (ii) and (iii) is very small (i.e., with 
optimal weighting, errors near the pure-Gaussian errors are 
achieved) , but this is not the complete story, especially when 
dealing with current Gaussian or sub-optimal data analyses. 
The question we address is the following - by how much does 
the error bars on the least robust BAO dilation scale (case 
(iii)) differ from a correct calculation based on an optimal 
covariance matrix and properly inverse covariance weighted 
(case (i))? 

Finally, we go one step further and repeat the mea- 
surements of non- Gaussian effects on reconstructed den- 
sity fields. We apply a density reconstruction algorithm 
llEisenstein et that was developed to improve the 

BAO signal at late times, which is parti ally lost due to 
non-linear coupling between Fourier mod es |Noh et "al l2009l ; 
IPadmanabhan et alll2009l ; ISeo et aHl2010l ) . Other approaches 
have been explored to recover some of the Fisher information 



1 In the case of a data analysis, however, the optimal measure- 
ment of the covariance matrix is complicated by the fact that the 
underlying matrix C(k, k') is six-dimensional, and non-isotropic 
in the sense that pairs of mode separated by smaller angles are 
more correlated. On top of that, the observed quantity is convo- 
luted in six-dimension al with pairs of survey selection function 
jHarnois-Deraps fc Penll201ll) . For these reasons, we find solu- 
tions that apply to periodic volumes, and we leave it for future 
work the extraction of optimal estimator in more complex data. 



lost in gravitational collapse dGoldberd 20001; Zhang et all 
l2010al ; lYu et alllioicl ; iNevrinck et alll2009l ; ISeo et all (201 il l. 



but it was not verified how these methods propagate to con- 
straints on cosmological parameters. The full BAO analysis 
is indeed sensitive to this intermediate stage, but in a non- 
Gaussian treatment, the interplay between the off-diagonal 
elements of the covariance matrix and the derivatives is quite 
subtle. We thus set forth to test quantitatively how this re- 
construction algorithm affects the BAO dilation error, for 
the three analysis cases mentioned above. 

The paper is organized as follows. In Section[2]we briefly 
review how BAO dilation measurements can constrain dark 
energy. In Section [3] we describe our set of N-body simula- 
tions and the reconstruction algorithm. In Section [4] we dis- 
cuss how to extract both the Gaussian and the non-Gaussian 
covariance matrices. In Section [5] we describe the Eigenvec- 
tor decomposition, while in Section [6] we present the Fisher 
matrix formalism and the estimators of the BAO dilation 
uncertainty for the three analysis cases. Finally, in Section 
[7] we present and discuss our results. 



2 BACKGROUND 

2.1 Baryon acoustic oscillations 

The matter clustering we observe today is the result of tiny 
inhomogenei ties set during inflation in the early Universe 
(Guth 2004). Over time, matter collapsed gravitationally 
into over-dense regions that eventually evolved into large 
scale structures. As long as the perturbations are small, the 
structure growth equations can be linearized. We can there- 
fore understand the evolution of inhomogeneities by consid- 
ering perturbations one at a time. 

In the early Universe, matter and photons are coupled 
together as a single fluid via Thomson scattering. Due to ra- 
diation pressure, photons begin to disperse away from over- 
dense regions, pushing the baryons alongside. Dark matter, 
on the other hand, only interacts weakly with that fluid, 
via gravity, and does not respond efficiently to the photons' 
push. The perturbation eventually grows into a state where 
the initial clump of dark matter gets surrounded by a spheri- 
cal ripple of baryon-photon fluid, which expands at the speed 
of sound c s ~ c/\/3. 

At about z ~ 1000, when photons decouple from mat- 
ter, they no longer push the baryons. The speed of sound 
in the fluid drops abruptly, and the BAO ripple stops mov- 
ing and freezes out. Eventually, dark matter also responds 
gravitationally to this over-dense region of baryons. The re- 
sult of this initial point-perturbation is a smooth clump of 
matter with a spherical shell of density enhancement at the 
sound horizon, about 150 Mpc away from the center. The 
complete final field is, by Huygens's principle, the superposi- 
tion of similar spherical ripples of density enhancement from 
the initial perturbations at all points. Therefore, we do not 
observe these spherical ripples directly, but measure them 
statistically in the mass auto-correlation function. 

Using galaxies from the Sloan Digital Sky Survey as 
tracers of the matter distribution, a n excess correlation a t 
150 Mpc apart has been observed (|Eisenstein et a 3 120051 1. 
Although the BAO in the correlation function is intuitive, 
we are interested in the BAO power spectrum P(k) which 
is related to £(r) by a transformation 
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dk. 



(1) 



The correlation peak in real space is manifested as a se- 
ries of wiggles in Fourier space. This oscillatory feature 
is even more robust ag ainst observational contaminations 
l|Seo fc Eisensteinl 120031 ) . 



2.2 Dark energy constraints 

The accelerating expansion of the Universe is widely blamed 
on dark energy, a mysterious entity which contributes to 
more than 70% of the energy content in the Universe. Dark 
energy can be described by an equation of state 
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WC p 
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relating its pressure P and density p. A common parame- 
terization of w is 



j(z) = Wo + 



W\Z 

l + z 



(3) 



where wo is the value in the present day. The Friedmann 
equation now reads 

Hi [fi m (l + zf + 0.(1 + zf + fi fc (l + zf 



H\z) 



+Sl\ exp ( 3 



1 + w(z) 
l + z 



dz 



(4) 



where the terms on the right hand side are the energy con- 
tributions of matter, radiation, curvature, and dark energy 
respectively. For an object of a fixed co-moving size s, its 
projections across and along the line of sight are given by 

cAz 

s \\ = 



s± 



H(z) 
(l + z)D A (z)A6 



(5) 
(6) 



respectively. Az and AO are the redshift span and angular 
size of the object on the sky, and 

dz' 



D A (z) = 



l + zj H(z') 



(7) 



is the angular diameter distance to its center. If this object 
has a fixed co-moving size, it then acts as a standard ruler, 
where measurements of Az and A8 give estimates for H and 
Da, respectively. This, in turn, provides constraints for w(z) 
via Equation U given an adequate redshift sampling. 

In this paper, we restrict ourselves to an idealized 
isotropic universe with no observational distortions, such 
that the standard ruler is given by the sound horizon, which 
has size s — sm = s±. Therefore, the fractional errors on 
these quantities can be used to constrain H(z) and Da(z). 
To estimate the error on s, we construct a Fisher matrix 
that propagates the correlated uncertainty measured in our 
simulated power spectra. 



3 SIMULATIONS AND DENSITY 
RECONSTRUCTION 

We run a total of 1000 simulations using a particle-particle- 
particle-mesh (P 3 M) N-body code CUBEP3^0, the successor 



to pmfast |Merz et a]||2005l ). Each simulation has N = 256 3 
dark matter particles in a periodic cube of L — 600 /i -1 Mpc 
on one side. The initial condition of each simulation at 
2 = 100 is produced by a Gaussian random field, which 
is characterized by an initial transfer function generated by 
C AME0. For this work we use the following cosmological pa- 
rameters: n m = 0.279, fl b = 0.044, 0\ = 0.721, h = 0.701, 
n s = 0.96, as = 0.817. We output the particle positions and 
velocities at redshifts 0.5, 1.0 and 2.0, and then the particles 
are assigned to a densit y field £(x) on a 512 3 grid u sing the 
cloud-in-cell algorithm (|Hocknev fc Eastwood! Il980l ). 

To understand the density reconstruction algorithm, it 
is helpful to look at the process by which initial conditions 
are generated in a typical N-body simulation. A Gaussian 
random field <5(k) in Fourier space can be constructed by 
generating a field of Gaussian random numbers whose vari- 
ance is determined by an i nput power spect rum. Under the 
Zel'Dovich approximation (|ZerDovicb| [l970;') . we then com- 
pute a displacement field 



s ( k ) = -7T^( k ) 



(8) 



which, when Fourier transformed back into real space, can 
be applied to displace a uniformly distributed set of par- 
ticles. In addition, the density field allows us to calculate 
the gravitational potential, whose gradient gives the parti- 
cles' initial velocities. We solve for these initial conditions 
at z = 100, where our simulations begin. 

The reco n struc tion algorithm developed by 
lEisenstein et all (|2007l ) essentially uses the <5(x) output 
to calculate a displacement field, and subtracts the dis- 
placements from the particle positions. This is indeed very 
similar to the procedure that generates initial conditions 
described previously, except that the displacements are 
subtracted from the particles' positions, instead of added. 
T he algorithm is t he following, as was neatly summarized 
bv lNoh et all l|2009l ). 

(1) Calculate the density field <5(x) using particle posi- 
tions, and then transform it into Fourier space S(k). 

(2) Calculate the displacement field s(k) in Fourier space 
using the Zel'Dovich approximation, where 



s(k) = - * *(k)F(k), 



(9) 



and -F(k) = exp[— (kR) 2 /2] is a smoothing function of scale 
R. 

(3) Transform the displacement field back into real space. 
Subtract this displacement from the positions of the simu- 
lation particles and calculate the new density field <5<j(x). 

(4) Repeat the previous step, but applying the displace- 
ment field onto a set of uniformly distributed particles in- 
stead of simulation output. Calculate the density field <5«(x) 
from these displaced particles. 

(5) The "reconstructed" density field <5'(x) is given by the 
difference between the above density fields: 



<5'(x) = <S d (x) - 5«(x). 



(10) 



The correlation functions f(r) before and after recon- 
struction (using smoothing scale R = 10 /t -1 Mpc) are shown 



2 http: //www.cita.utoronto.ca/mediawiki/index.php/CubePM 



3 http://camb.info/ 
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Figure 1. Correlation function at redshifts z = 0.5 and z = 
2.0, as well as reconstruction (smoothing scale R = 10 ft - 1 Mpc) 
at z = 0.5. The functions are rescaled by the square of the growth 
factors so that the acoustic peak locations match the z = 0.5 case. 



in Figure[T] In Sections l6.3l and[7l we quantify the effect that 
reconstruction has on distance error estimates. 
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4 POWER SPECTRUM ANALYSIS 
4.1 Matter power spectrum 

In the presence of anisotropy (eg. redshift distortions), the 
power spectrum P(k, fx) takes in an angular dependence fi = 
cos 6. In an isotropic universe, however, it is only a function 
of the scale and is denned as 



P(k) = (\S(k)\ 2 ) 



(11) 



where the angled brackets denote the average of all k modes 
such that |k| = k. To obtain P(k) from our N-body simu- 
lations, we assign particles onto a density grid <5(x), Fourier 
transform it into S(k), and take the averages over the spher- 
ical shells of radius k in Fourier space, using the nearest- 
grid-point scheme. 

Because our goal is to measure both a covariance matrix 
and a Fisher matrix from a finite number of realizations, the 
binning must be chosen carefully. On one hand, we need to 
be maximally sensitive to the BAO signal, hence it is impor- 
tant to resolve many wiggles. Otherwise, our analysis would 
under-sample the rapidly oscillating signal and our results 
would be less robust. On the other hand, we need to limit 
the number of matrix elements to address the issue of con- 
vergence. We thus opt for mixed binning, which is linear for 
k < 0.5 Mpc _1 /i (Afc ~ 0.01 Mpc _1 /ij3 and logarithmic for 
larger k modes (Alogfc = 0.062) . We end up with 54 bins, 
which translates into 2916 matrix elements, and we resolve 
the first seven BAO wiggles. Figure [2] shows the average 
P(k) over 1000 simulations. 



4 The linear bin width is not exactly 2tt/L = 0.0105 Mpc _1 /i 
where L is the box size because the bins have been corrected for 
averaging with the nearest-grid-point scheme. This correction is 
significant for small k where the number of modes that contributes 
to the average is small. 



Figure 2. Dark matter power spectrum at z = 0.5 produced 
by averaging 1000 simulations. Top panel: The simulated dimen- 
sionless power spectrum A 2 (fc) = P(fc)fc 3 /27r 2 compared against 
calculations by CAMB, where "linear" is from the linear ACDM 
theory, and "nonlinear" is based on HALOFIT l|Smith et alll2003l l 
numerical calculations. Bottom panel: The simulation power spec- 
trum divided by linear theory and HALOFIT power spectra. The 
first 47 modes of our spectrum are binned linearly, while the oth- 
ers are binned logarithmically. The error bars in each plot are the 
errors on the mean. We note the few percent discrepancy between 
the output and linear theory on large scales, which is caused by 
overly large time steps taken by the simulator at high redshifts, 
and will be corrected in further simulations. However we have ran 
convergence test on the starting redshift of the covariance matrix 
and found that a starting redshift of z = 100 produces the most 
accurate matrix both at high and low redshifts. 



4.2 Covariance matrix 

Given our series of P(k) measurements, we calculate the 
covariance matrix between each data point as 

1 n 

C(k, k') = -i- J2 [«(*) ~ (P(k))] [Pi(k') - (P(k')}] (12) 

where Pi(k) is the power spectrum of the ith simulation, 
and n = 1000 is the number of simulations we have. 

The power spectrum covariance matrix is best visual- 
ized as a cross-correlation coefficient matrix 



P(k,k') = 



C(k,k') 



y/C{k, k)C(k',k') 



(13) 



This definition normalizes the covariance matrix by the diag- 
onal components so that they are all unity. This is consistent 
with the fact that a given k mode is always perfectly corre- 
lated with itself. Therefore, p(k,k') ranges from —1 (perfect 
anti-correlation) to +1 (perfect correlation), where is no 
correlation at all. 
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Figure [3] shows the correlation matrix corresponding to 
the 1000 power spectra shown in Figure [2] We observe that 
the largest fc-modes in the simulation box exhibits an anti- 
correlation of about 20% with the small scales, which them- 
selves are highly correlated. The systematic anti-correlation 
has very little impact on the final measured quantities. We 
find that neither cutting off these three fundamental modes 
nor suppressing those negative covariance values to zero 
at late redshifts affect our results. This is not surprising 
because, as we show in the following sections, the lowest- 
fc modes contain very little Fisher information, while the 
largest-fc modes usually do not contribute much to our dis- 
tance errors. Therefore, the presence of this anti-correlation 
has no influence on our conclusions. 

In the three analysis cases studied in this paper, we work 
with two kinds of covariance matrices. The non-Gaussian 
covariance matrices are computed using Equation 1121 while 
the Gaussian covariance matrices (|Tegmarklll997h are given 
by 

, 2P(fc)P(fc') 
C 3 {k, k ) = — S kk , (14) 

where N k is the number of modes that contributed to the 
k bin (counting the real and imaginary parts separately), 
and P(k) can be calculated from linear theory. Indeed the 
Kronecker-delta symbol S kk i ensures that the Gaussian co- 
variance matrix is diagonal, and assumes that all the k 
modes are uncorrelated. In a real survey, when only one 
Universe can be observed, it is much harder to calculate the 
correlation between different k modes. As shown in Figure 
[3] the actual covariance matrix is clearly not diagonal as we 
approach the non-linear regime. Furthermore, as shown in 
Figure SI even the diagonal components of the non-Gaussian 
covariance matrix are not Gaussian. The Fourier modes on 
small scales are coupled both with neighboring scales, and 
across different directions on a given scale. In this context, 
the best estimate of the true covariance matrix is obtained 
from an ensemble of N-body simulations, and the result- 
ing non-Gaussian treatment of the data is our best shot 
at estimating both the mean and the error on the mean. 
The replacement of the non-Gaussian covariance matrix by 
a Gaussian one is often used as a shortcut, but one should 
keep in mind that the results in general may not be reliable. 

4.3 Fisher information function 



1.0 



As p reviously considered by I Rimes fe Hamilton! |2005, 
I2006I ). a useful quantity that can be derived from the covari- 
ance matrix is the cumulative Fisher information function 

^fe) = y ] C norm (k, k ) (15) 



where C, 
words, 



t, is the j x j sub-matrix of C(k, k') (or C g ) with 
kj, further normalized by {P(k)){P(k')) . In other 



a 



l (k,k') = 



C(k,k') 



(fc, fc' < kj only). (16) 



<p(fc))(P(fc')) 

Here I(k) describes the information contained in the ampli- 
tude of the power spectrum up to a scale k. Although the 
form of I(k) seems like a mathematical co ntrivance, it ind eed 
has physical significance in weak lensing |Lu et a J l2010l ). 



0.1 




10.8 



]0.6 



0.4 



0.2 



-0.2 



0.1 

k[h Mpc" 1 ] 

Figure 3. Correlation matrix (Equation ll3l) at z = 0.5 produced 
by averaging 1000 simulation power spectra as shown in Figure[2] 
The dark points on the diagonal and at the large-fc regime indicate 
positive correlation, as we expected. As described in Section [5] 
this figure shows a binning independent version of the correlation 
matrix. 



Figure [5] shows the Fisher information function using 
our set of simulations. It also compares the information 
that can be obtained by a Gaussian and a non-Gaussian 
covariance matrix. At small k, gravity is still linear, so 
the non-Gaussian information is similar to the Gaussian 
case, as also seen in Figure EH At fc > 0.3 Mpc - ft, how- 
ever, the information flattens out into a so-called "trans- 
linear plateau" , wh i ch is c onsis tent with previous stud ies by 
I Rimes fc Hamilton! ([2005 ) and lTakahashi et all i|201ll ). This 
corresponds to the scale where the fc modes become corre- 
lated. 

Qualitatively, the Fisher information function tells us 
how much information can be extracted from a given fc 
mode. At small fc, each mode is independent, so more in- 
formation can be extracted by measuring more modes. At 
large fc, the Fisher information function flattens, meaning 
very little information can be extracted by measuring ex- 
tra modes. Indeed, as seen from Figure [5] one can extract 
orders of magnitude more information when using a Gaus- 
sian covariance matrix instead of the true covariance matrix. 
This information, however, does not exist in the data. Conse- 
quently, an analysis using a Gaussian estimator would most 
likely underestimate the error bars for any measurements 
beyond the trans-linear scale (fc > 0.3 Mpc _1 /i). 

Before proceeding to the construction of the Fisher ma- 
trix and of the three estimators, we reduce the noise that 
exists in our covariance matrices, as described in the follow- 
ing section. 



5 NOISE REDUCTION 

As mentioned in the introduction, to achieve convergence 
on the N 2 elements of the covariance matrix is not an easy 
task. Among the possible approaches, the brute force way 
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Figure 4. The ratio between the diagonal components of the 
non-Gaussian {Co) and Gaussian ((C g )u) covariance matrices, 
at z = 0.5. For C g we evaluated in 2 ways. Using Equation 1 141 
the P(k) can either be averaged over our set of simulations, or 
simply computed from linear theory using CAMB. For either case, 
this plot shows that even the diagonal components of the non- 
Gaussian covariance matrix are not Gaussian, as the ratio clearly 
departs from unity with higher k. The sharp increase at k ~ 
0.5 Mpc _1 /i is caused by the logarithmic binning, in which each 
bin suddenly accumulates the departure contributions from many 
more Fourier modes. As we will justify in the results discussion 
later, we do not consider modes where k > 0.5 Mpc - h. 



(i.e. running a very large number of N-body simulations) 
is probably the least affordable in terms of computational 
resources, fn this section, we apply a noise reduction tech- 
nique that reduces to 2N the number of elements one needs 
to extract. 

Our technique is based on the assumption that the 
cross-correlation coefficient matrix (see Figure [3]) can be pa- 
rameterized as the sum of a diagonal and off-diagonal com- 
ponents, and we put a strong prior on the latter: it must be 
expressed as a sum over the outer products of the principal 
Eigenvector^. The diagonal component is then simply ob- 
tained such as each diagonal element in the final matrix is 
equal to 1. 

To extract the Eigenvectors, we perform an iterative 
eigenvalue decomposition of the off-diagonal component and 
keep only the dominant terms. We start by modeling the 
full matrix as the sum of an identity matrix 5 fe fe / and an 
off-diagonal component, from which we extract the largest 
eigenvalue A and Eigenvector U\(k). At the next iteration 
step, we model the diagonal component as (1 — XUx(k)), 
subtract that quantity from the original matrix, and extract 
the principal Eigenvector of the result. We repeat this pro- 
cess until the difference between the original and the model 
converges - four times in this case. At the end of the iterative 
step i, the parameterized matrix is thus given by 



5 Because the off-diagonal elements are monotonically increasing 
both towards higher k- and fc'-modes, the number of significant 
Eigenvectors is expected to be small. In contrast, a matrix which 
would be strong when close to the diagonal, but decreasing when 
moving away, would require a larger number of vectors. 
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Figure 5. Top panel: Cumulative information fEquation ll5H per 
unit volume at Z = 0.5. The filled circles were obtained by using 
both the reconstructed power spectrum and the reconstructed co- 
variance matrix in Equation ll6l Bottom panel: Same plots as the 
top, but with different binning schemes to show that the plateau 
of infor mation is not a binning art i fact. Similar plots can also be 
found in lRimes fc Hamilton! fcOOSl l2006h . 



P i (k,k , ) = 5 k ,y (l-A^vrW) +yui(k)ui(k') (n) 

After the last iteration, we update the first term with the 
latest A and U\(k). The diagonal component typically starts 
off as unity in the largest scales and fades away in the non- 
linear regime. In this case, the strongest eigenvalue is almost 
two orders of magnitude larger tha n all others, hence we 
keep only a single Eigenvector (see (|Harnois-Deraps fc Pen! 
2011) for the application of this method when the angular 
dependence of the covariance matrix is preserved). 

The modeled covariance matrix is then recovered from 
this noise-reduced cross-correlation matrix with the diago- 
nal elements of the original covariance C(k, k) (not to be 
confused with the diagonal component of the factorization 
of p(k, fc')) following Equation 1131 Although we do need N 2 
matrix elements to start with, the power of this decomposi- 
tion relies on the fact that the prior is accurate at the few 
percent level, and that everything that does not fit into this 
form is considered as noise. We thus recover smooth and ac- 
curate covariance matrices, even though the input elements 
are noisy. The explanation for this is that one needs only to 
measure the N diagonal elements of the covariance matrix 
(which are generally the easiest to resolve) plus one Eigen- 
vector (which combines the data of a large portion of the 
matrix into another N elements). 

We present the fractional error between the modeled 
and the original matrices in Figure [6j and observe that in- 
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Figure 6. Absolute value of the fractional error between the mod- 
eled covariance matrix and the original, from unreconstructed 
density fields at z = 0.5. As expected, the lower k modes are 
more noisy due to lower statistics. We still achieve an agreement 
of about ten percent at k ~ 0.22 Mpc~ 1 h, and even better for 
smaller scales. 
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Figure 7. The ratio of the diagonals to the Gaussian approxima- 
tion. The squares are from the original matrix, the open circles are 
from the diagonal component of the factorization, and the stars 
are from the diagonal of the off-diagonal contribution. The diago- 
nal component is closer to the Gaussian prediction down to much 
smaller scales, as most of the non-Gaussianities are factorized in 
the other term. 

dividual elements match the model at the ten percent level 
at k ~ 0.2 Mpc _1 /i. This improves to the few percent level 
for even smaller scales. We expect these higher k modes to 
be more accurately measured to start with, since they come 
from the angle averaging of fc-shells that contain much larger 
number of cells. 

We then present in Figure [7] the diagonals of the orig- 
inal covariance matrix and of both components, divided by 
the Gaussian prediction. The sum of the two components 
is equal to the original matrix on the diagonal by construc- 
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Figure 8. The ratio between the Fisher information of our mod- 
eled covariance matrix and that of the original, from unrecon- 
structed density fields at z = 0.5. The departure from the original 
is at most 10%, and the agreement is otherwise at the few percent 
level. 

tion. The off-diagonal components of the factorization can 
be tested against the original by comparing the cumulative 
Fisher informations (Equation 1 1 5 p . We observe that the in- 
formation from the factorized covariance matrices exhibit 
the same essential features, namely a Gaussian increase up 
to about k ~ 0.2 Mpc _1 /i, followed by a saturation plateau. 
We present the ratio of this modeled information to the orig- 
inal distribution in Figure [5] and observe that the factor- 
ization reproduces the information content at the few per- 
cent level. Although the figures presented in this section 
correspond to the unreconstructed densities at z — 0.5, we 
achieve comparable performances on reconstructed density 
fields at all measured redshifts. 

This factorization has one extra advantage, which comes 
from the fact that U (k)y / C(k, k) is binning independent. 
The original covariance matrix is binning dependent, since 
the number of modes entering each element varies as we 
change from linear to logarithmic bands, for example. This 
is correct, but it has a major inconvenience - the cross- 
correlation coefficient matrix (Equation ll2[) visually changes 
significantly. It becomes a tedious task to compare figures 
from different authors. We can therefore attempt to fix this 
problem by constructing binning independent quantities. 

The diagonal component, 1 — XU 2 (k), is close to Gaus- 
sian, as seen in Figure [7] hence it is roughly inversely pro- 
portional to the number of modes in the bin. We scale it 
with the ratio between the measured number of modes and 
the continuous limit case: N cont ~ 4/37r(fc/fc m i n ) 3 , and ob- 
tain a binning independent quantity. We have also shown in 
this section that the off-diagonal elements of the covariance 
matrix are well modeled by \U(k)U(k')^/C(k, k)C{k', k'), 
which are binning independent as well. The solution is thus 
to replace the diagonal of the original covariance matrix such 
that 

C(k,k) -s- [1 - \U 2 {k)} — — + XU 2 {k). (18) 

This bin independent result is made available thanks to 
the factorization presented above, which allows us to iso- 
late the bin dependence (that comes exclusively from the 
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diagonal component) and to apply our correction. The cross- 
correlation coefficient matrix calc ulated that way is shown i n 
FigureE] and compares well with lRimes fc Hamiltonll|2005h . 
apart from the anti-correlation between the largest and the 
smallest modes (see s ectio n [4~2l an d note that it is not clear 
that the simulations Rimes fc Hamilton! {2005) were large 
enough to contradict our anti-correlation result). 



of the sinusoidal components in the Fisher matrix as a con- 
stant. In our work, since we use non-Gaussian (hence non- 
diagonal) covariance matrices, we do not adapt that con- 
stant approximation, as the detailed structure of the sig- 
nal becomes important once off-diagonal errors are present. 
Nevertheless, it is interesting to show what effect this ap- 
proximation and the off-diagonal elements of the covariance 
matrix can have on our results. See Appendix A for this 
discussion. 



6 PARAMETER ESTIMATION 
6.1 Optimal estimator 

For a set of parameters pi , the Fisher matrix F-y is given by 
the inverse of a covariance matrix 



(19) 



where dj = C{ki,kj) or C 9 , as described in the previous 
sections. Since our dj is derived from power spectra, the 
parameters in Fij can be thought of as the power for each 
k mode in the power spectrum. To obtain the Fisher matrix 
for another set of parameters q^, we can project the power 
spectrum Fisher matrix onto the new parameter space using 



dP{h) dP(kj) 



(20) 



Using -Fa/3, the optimal error estimator for parameter 
simply 



F 



-l\l/2 



(21) 



The estimators for cases (ii) and (iii) are obtained from sub- 
stituting Equations If 21 and If 41 respectively in Equation 1 191 
For our purposes, since we are interested in estimat- 
ing the fractional errors of s (ID) or s± and sii (2D), we 
set q = Ins in fD, and qi = lnsx and §2 = In sn in 2D. 
The derivatives in Equation [20] can be evaluated in a num- 
ber of ways. We can either produce a variety of P(k) using 
different parameters s and take their finite differences, or 
we can parametrize P(k) as a function of s, and evaluate 
the derivatives analytically. In both cases, we decompose 
the power spectrum into a smooth component and a wiggly 
component 



P{k) — Psmooth(k) + P w iggle 



(22) 



When using BAO to measure distances, all information is 
manifested as "wiggles" in the power spectrum. To ensure 
that all our measurements originat e from BAO, we subtract 

1 smooth 

(k) jEisenstein fc Hullf 9981 ) from our full P(k) prior 
to taking derivatives. For the rest of this paper, we refer 
Pmi gg ie(k) and dP m iggi e /dx to simply P(k) and dP/dx for 
brevity. 

The finite differencing of P(k) is done by producing 
many P(k) using CMBFASlQ, and dividing their differ- 
ences by the differenc es of s being used . On th e other hand, 
readers familiar with ISeo k, E iscnstcin l|2007l ) would recall 
that they suggested an analytical form of the wiggles using 
a damped sync function, and they approximated the square 



http: / /cmbfast.org/ 



6.2 Sub-optimal estimator 

In a data analysis, non-linear covariance matrices are usu- 
ally hard to measure with a high signal to noise, especially 
in surveys that exhibit complex selection functions. What is 
often done in that case is to assume Gaussianity in the data, 
while ignoring the fact that the band powers themselves are 
actually non-Gaussian. The values extracted for the mean 
and the error on the BAO dilation scale are not properly 
weighted, since they assume that all errors in k— bands are 
uncorrelated. As mentioned earlier, the resulting mean is 
sub-optimal, while the error bars are most likely off by an 
unknown amount. Since we now have measured a non-linear 
covariance matrix C, we are in a position to compare the 
correct error bars for existing data analysis (case (i)) with 
those quoted in literature, which approach case (iii) at vari- 
ous levels. We recall that the difference between sub-optimal 
and optimal is somewhat reduced for analyses that model 
the non-linearities in the fields. 

We derive a "sub-optimal" estimator by solving the lin- 
ear system Ax — b where x is a vector containing a set of 
cosmological parameters of interest (here we consider only 
x = A Ins, hence a; is a scalar, but this method could be 
generalized to include lnsx and lnsn for instance), and b is 
a noisy observable, associated with a noisy covariance ma- 
trix C. In our case, the observable under study is AP(k), 
the deviation from the mean of the power spectrum. With 
this correspondence, we get A = dP(k)/dlns - a vector in 
our case. To estimate each component of x, we first weight 
each observed point by the inverse of the covariance matrix 
associated with the observation of 6, i.e. C, and then proceed 
to solve for x by taking pseudo-inverses such that 

x = {A T C~ 1 A)- 1 A T C- 1 b. (23) 

Finally, the errors on the elements of x are given by the 
diagonal components of the covariance matrix {x 2 } = xx T . 
We obtain the following estimator for the error in Ins: 

((Alns) 2 ) = (P^C- 1 P a r 1 PlC' 1 CC- 1 P a (PlC~ 1 Ps)- 1 (24) 

where P s = dP/dlns, and PjCP a is a vector-matrix- 
vector product similar to Equation 1201 and where C = bb T 
is the improved estimate of the non-linear covariance ma- 
trix, which we obtained from our simulations. Notice that 
if the true covariance matrix was indeed Gaussian (i.e. 
C = C — C g ), we would recover the optimal estimator 
where ((Alns) 2 ) = (PjC^P,) -1 = F^ 1 . In other words, 
cases (i), (ii), and (iii) would be identical. Conversely, if the 
original matrix was already the optimal measurement (i.e. 
C = C), we would get ((Alns) 2 ) = (PjC _1 P s ) _1 , i.e. cases 
(i) and case (ii) would be the same, possibly different from 
case (iii). We also note that the inverse of the true covariance 
matrix is not involved in Equation 1241 
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In the next sections, we consider the case where Gaus- 
sianity was originally assumed in a BAO analysis, we cor- 
rectly estimate the error of x (with this sub-optimal esti- 
mator) and compare the results produced with an optimal 
estimator. 



6.3 Effects of reconstruction 

As seen in Figure [TJ density reconstruction sharpens the 
acoustic peak in the correlation function. This is equivalent 
to reducing the non-linear damping of the wiggles in the 
power spectrum. We parametrize the reduction of damping 
due to reconstructio n by an extra factor in front of S n ; in 
the damping factor l)Seo fc Eisensteinll2007h such that 





K Z-'nl 




exp 


2 


— > exp 



(25) 



where / = 1 represents 100% reconstruction, canceling any 
non-linear effects. In reality, such a case is unachievable, 
as some information has been irreversibly lost. In princi- 
ple, we could measure / by extracting the power spectrum 
wiggles before and after reconstruction, and find the best 
fit damping factor. Looking at Figure [1] we see that the 
reconstructed correlation at z = 0.5 is very similar to the 
correlation at z = 2. The ratio of growth factors between 
these two redshifts is 0.55, and the reconstructed field actu- 
ally looks slightly better than the z = 2 field, so we adopt 
the s tandard / = 0.5, which reduces non-linear damping by 
50% (|Seo fc Eisensteinll2007l ; fMasui et aJll2010h . 



7 RESULTS AND DISCUSSION 

We set up hypothetical surveys with volume V = 
(1 /i _1 Gpc) 3 centered on redshifts z = 0.5 and z = 1.0. 
We then produce fractional distance errors As/s using the 
three aforementioned estimators: 



• Sub-optimal estimator (Equation fJH 

• Optimal estimator (Equation I21[) with a non-Gaussian 
covariance matrix (Equation ll2|) . 

• Optimal estimator (Equation I2ip with a Gaussian co- 
variance matrix (Equation 1 14|) . 

The distance error measurements use only the information 
up to a limiting k max in the covariance matrix; all fc > k max 
are marginalized over (i.e. cut off from the covariance ma- 
trix before it is inverted). For each error estimate, we also 
produce bootstrap error bars to show the convergence of our 
set of 1000 simulations. This is done by picking 1000 random 
simulations (allowing repeats) from our set, and taking the 
standard deviations of the results using 2000 such random 
sets. The noise reduction technique is performed for every 
set. 

The only redshift dependence in the estimator comes 
from the non-linear damping scale £„;(z) (Equation I25|) . In 
addition, we know from Figures [3] and [5] that the covariance 
matrices in the linear regime are similar to the Gaussian co- 
variance matrix. Therefore, we expect errors at small k to 
be hardly distinguishable among estimators and redshifts. 
At large k, however, we expect the effects of the covariance 
matrices and the derivatives to have a more pronounced 
effect on the estimator. The redshift dependent damping 



scale T, n i(z) becomes important and distinguishes between 
z — 0.5 and z — 1.0. 

The top panels of Figures [5] and QJJ] show the measured 
distance errors vs limiting k max , with and without recon- 
struction, respectively. The bottom panels shows the squares 
of the ratio of cases (i) and (ii) to case (iii) from the top 
plot at z — 0.5. In all scenarios, we consider only the modes 
k < 0.5 Mpc -1 ft where our simulations remain reliable (Fig- 
ure]!?]). Moreover, we do not expect modes of fc > 0.5 Mpc _1 ft 
to carry any BAO informa tion, as the wiggles on s mall scales 
suffer from Silk damping (|Eisenstein fc Hu| [l998) as well as 
non- linear damping (Section 16. 3p . 

Figure [TU] shows the distance errors with / = 0.5 and 
covariance matrices from reconstructed simulations. Since 
some of the wiggles are indeed recovered, the optimal dis- 
tance errors decreased by 50% to 70% compared to the 
case with no reconstruction (Figure [9|. More importantly, 
though, the discrepancy between sub-optimal and optimal 
estimates becomes slightly more severe. In the presence of 
reconstruction, a naive Gaussian assumption can underesti- 
mate the variance of the errors by up to 20% near the trans- 
linear scales (fc ~ 0.2 Mpc _1 /i). Somewhat disturbingly, at 
small scales k > 0.5 Mpc _1 /i the sub-optimal estimate devi- 
ate significantly from the optimal estimate. A sub-optimal 
estimator certainly is not expected to provide the same re- 
sult as an optimal estimator. And also, as mentioned above, 
the regime fc > 0.5 Mpc _1 /t is irrelevant to BAO analysis. 
Therefore, we did not investigate this behavior further. 

Our results from Figure [9] show that the three treat- 
ments of the errors have similar constraining power on the 
BAO dilation scale. This co nclusion is consistent with that 
from pTakahashi et all l|201ll ). which measured that cases (ii) 
and (iii) give almost identical results, without reconstruc- 
tion. However, we push the envelope further and show that, 
first, the sub-optimal estimator also behaves similarly, with 
deviations by up to 15%. Second, we show that the recon- 
struction of density fields improves the constraints on dila- 
tion by about 70% at z = 0.5. The improvement is more 
modest at higher redshifts, where additional BAO peaks are 
still in the linear regime. Third, we conclude that with re- 
construction, Gaussian assumption can underestimate the 
errors on the dilation scale in a similar level to the case 
without reconstruction. 



8 CONCLUSION 

We have addressed some aspects of the problem of measuring 
non-Gaussian error bars in a BAO analysis. We have investi- 
gated the full optimal quadratic non- Gaussian estimator on 
reconstructed density field, and quantified the significance 
of the non-Gaussianities on the BAO dilation scale error. 
A major subtlety is that the optimal estimator requires an 
accurate measurement of the inverse of the covariance ma- 
trix of power spectrum, and the actual uncertainty on that 
inverse had never been measured. The accuracy of the in- 
verse is generally bin dependent, and convergence requires 
a large number of N-body simulations. We have overcome 
this problem with a factorization technique that involves an 
iterative eigenvalue decomposition of the covariance matrix, 
which we measured from 1000 N-body simulations. We fur- 
ther measured the uncertainty of the inverse of the matrix 
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Figure 9. Top panel: Fractional distance errors using optimal and 
sub-optimal estimators, without reconstruction. Optimal (Gauss) 
and Optimal (non Gauss) denote Fisher estimators from covari- 
ance matrices C g and C respectively. Sub- optimal denotes the cor- 
rect non-Gaussian errors if Gaussianity was assumed. The error 
bars on each line are the 1-a bootstrap error bars for the statisti- 
cal fluctuations in our set of simulations. Middle panel: Optimal 
non-Gaussian and sub-optimal errors, each divided by the opti- 
mal Gaussian errors at z = 0.5. This panel shows that each error 
estimate deviates by only a few percent from the Gaussian esti- 
mate. Bottom panel: Our dP/ds template which shows that the 
regime k > 0.5 Mpc - x /i is no longer relevant for BAO analysis, 
as the wiggles are mostly damped and converged to zero. 



with bootstrap re-sampling, and were able to achieve con- 
vergence at the percent level. 

Having confidence in the inverse matrix, we quantita- 
tively compared the measurement of the error on the BAO 
dilation factor obtained with different estimators. We con- 
struct an optimal estimator, which gives the most accurate 
measurement of the error achievable on the BAO dilation 
scale. It is derived from a covariance weighted Fisher ma- 
trix, which is constructed out of the inverse of the non-linear 
power spectrum covariance matrix. We compared our results 
with those obtained from the purely Gaussian forecast, and 
we measure significant discrepancies of up to ten percent in 
the error on the dilation scale. 



Figure 10. Fractional distance errors after reconstruction. Com- 
pared to Figure[9] this now includes reconstructed covariance ma- 
trices and a reconstruction factor of / = 0.5. In the sub-optimal 
case, the increase at the regime k > 0.5 Mpc - 1 h is a statistical 
curiosity, but irrelevant for a BAO analysis. We base our con- 
clusion on the values for k < 0.5 Mpc" 1 /!. This cut-off coincides 
with our choice to switch from linear binning to logarithmic bin- 
ning, but we have checked that the above plots contain no binning 
artifacts. 

We also measured non- Gaussian error bars on the mean 
BAO dilation scale that has been obtained with a Gaussian 
estimator, as is usually encountered in the literature. Be- 
cause we have confidence in the accuracy of our covariance 
matrix, this sub-optimal estimator provides a robust esti- 
mate of the error bars on the BAO dilation scale. To illus- 
trate our point, we considered the case where the original 
dilation scale was measured under standard Gaussian statis- 
tics. We found that the variances of those measurements can 
differ by up to 15%, compared to our optimal estimator. 
Many data analyses did include non-Gaussianities in their 
BAO error estimator, hence the discrepancy between these 
and our optimal estimator is likely to be more modest than 
that obtained in this work. 

We note in passing that these results were entirely ob- 
tained from N-body simulations, hence the effect of the sur- 
vey selection function has been factored out of our problem. 
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Constructing optimal estimators with actual data will how- 
ever be more challenging, since it has to include such an 
effect, which effectively couple Fourier modes from different 
bins, in addition to account for the effect of bias between 
the sampled tracers (i.e. galaxies or 21cm structure) and 
the underlying matter density. 

We have also implemented a density reconstruction al- 
gorithm, which recovers some of the lost BAO information 
due to non-linear gravitational collapse at late times. In that 
case, the error on the dilation scale is reduced by a factor 
of about 70% at low redshift, but the discrepancy between 
the sub-optimal and optimal estimates remains similar to 
the case without reconstruction (20% and 15%, with and 
without reconstruction) . 

We mention in conclusion that in a survey, the increase 
in variance we observed when using a sub-optimal estimator 
is equivalent to losing about the same percentage of sur- 
vey volume, because the variance of measurements is in- 
versely proportional to volume. These discrepancies should 
be taken seriously into account especially when forecasting 
performances of future telescopes, where the objective is to 
reach percent level precision on cosmological parameters. 
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APPENDIX A: POWER SPECTRUM 
DERIVATIVE 

Interestingly even though the Fisher information (Figure [S| 
of Gaussian and non-Gaussian covariance matrices differ by 
about an order of magnitude near k > 0.2 Mpc _1 ft, the 
measurement errors (Figures 1^1 and [TTT)l differ by only a few 
percent. The reason for this is that when the oscillatory 
power spectrum derivative dP/d\ns is multiplied into the 
covariance matrix to compute the Fisher matrix (Equation 
I20[) , the off-diagonal elements of the covariance matrix can 
be can celed out. 

In lSeo &: Eisenstein! (|2007h . they considered the regime 
k > 0.05 Mpc -1 /i and approximated a cos 2 (fcs) term, which 
originates from dP/dlns, as simply 1/2 without any oscil- 
lations when computing their Fisher matrix. Since we are 
considering a similar regime, we also attempted this ap- 
proximation. We emphasize that this approximation is not 
applicable to us, as our non- Gaussian covariance matrix is 
not diagonal. Nevertheless, it illustrates the effect that an 
oscillatory dP/d In s can have on distance measurements. 

Figure \K\ shows As/s as a function of limiting k m ax, 
using the approximated derivative. This can be compared 
to the top panel in Figure [9] where the only difference is 
that Figure[X]uses the approximated dP/d\ns, and Figure[9] 




Figure Al. Fractional distance errors using various estimators, 
similar to the top panel of Figure [9] (without reconstruction). 
The only difference here is that the power spectrum derivative in 
the Fisher matrix is an approximation which does not oscillate. 
Compared to Figure[9] the Optimal (Gauss) case is similar, but 
the other cases are clearly different. 



computes dP/d\n s by finite differences of actual power spec- 
tra. In both figures, the optimal estimators follow the inverse 
of the Fisher information (Equation I15p . For the optimal 
Gaussian case, the errors using either forms of derivatives 
give similar results (other than some oscillations at small k). 
This is expected since the Gaussian covariance matrix is di- 
agonal (Equation ll4|l . so the approximation is valid. For the 
sub-optimal cases we considered in this paper, however, the 
discrepancies from the optimal Gaussian case reach factors 
of 2 to 3, depending on redshift. This can be attributed to 
the fact that all elements of the covariance matrix are given 
a uniform weight. When using the finite difference deriva- 
tives, however, different elements are weighted according to 
the values of dP/dlns at their corresponding k modes. This 
effectively cancels most of the contributions from the off- 
diagonal elements of the covariance matrix, which explains 
why the optimal and sub-optimal errors differ by only a few 
percent in that case. 
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